Contributor

@jerryzh168 jerryzh168 commented Oct 1, 2025

Summary:
As the title says, we are adding regex support to simplify the config, and enabling it in both transformers and vLLM so that a regex config works everywhere.

torchao PR that adds the functionality to quantize_ API: pytorch/ao#3084
transformers PR: huggingface/transformers#41242
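
To illustrate how a regex key can simplify the config, here is a minimal sketch in plain Python. The `re:` prefix, the module names, and the string config values are assumptions made for illustration; they stand in for real torchao config objects and are not taken from this PR.

```python
import re

# Hypothetical module names for a 12-layer model (illustrative only).
module_fqns = [f"model.layers.{i}.self_attn.q_proj" for i in range(12)]

# Without regex: one explicit entry per module.
explicit_config = {fqn: "int4_weight_only" for fqn in module_fqns}

# With regex: a single pattern covers every layer.
# The "re:" prefix marking a regex key is an assumption in this sketch.
regex_config = {r"re:model\.layers\.\d+\.self_attn\.q_proj": "int4_weight_only"}

# Strip the prefix and check which module names the pattern fully matches.
pattern = next(iter(regex_config))[len("re:"):]
covered = [fqn for fqn in module_fqns if re.fullmatch(pattern, fqn)]

print(len(explicit_config), len(regex_config), len(covered))  # 12 1 12
```

Using `re.fullmatch` rather than `re.search` in this sketch means a pattern must describe the whole module name, which avoids accidental matches on substrings.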

Test Plan:
We save the model with the regex config in transformers; in vLLM we just make sure we can load the model:

VLLM_DISABLE_COMPILE_CACHE=1 pytest tests/quantization/test_torchao.py -k test_opt_125m_module_fqn_to_config_regex_model

model: https://huggingface.co/torchao-testing/opt-125m-ModuleFqnToConfig-v1-regex-0.14.0.dev

Output:

output: [([2, 133, 812, 9, 1470, 16, 5, 812, 9, 5, 1515, 3497, 4, 50118, 50118, 133, 812, 9, 1470, 16, 5, 812, 9, 5, 1515, 3497, 4, 50118, 50118, 133, 812, 9, 1470, 16, 5, 812, 9, 5], 'The capital of France is the capital of the French Republic.\n\nThe capital of France is the capital of the French Republic.\n\nThe capital of France is the capital of the')]

Reviewers:

Subscribers:

Tasks:

Tags:

        # we'll apply the first matched pattern
        c = module_fqn_to_config[maybe_module_fqn_pattern]
        break
else:
Collaborator

There seems to be an indentation error here.

Contributor Author

This is intended, actually: the `else` branch belongs to the `for` loop and runs only when the loop finishes without exiting through `break`, so we fall back to a default config when no pattern matches.
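
For readers unfamiliar with `for ... else`, the behavior described above can be sketched as follows. This is a simplified stand-in, not the code from this PR: the function name `resolve_config`, the `re:` prefix, the `_default` key, and the string config values are all assumptions made for illustration.

```python
import re


def resolve_config(module_fqn, module_fqn_to_config):
    """Return the config for module_fqn, or the default if nothing matches."""
    # An exact name takes priority over any pattern.
    if module_fqn in module_fqn_to_config:
        return module_fqn_to_config[module_fqn]
    for key in module_fqn_to_config:
        # Assumption: regex keys carry a "re:" prefix; other keys are skipped.
        if not key.startswith("re:"):
            continue
        if re.fullmatch(key[len("re:"):], module_fqn):
            # Apply the first matched pattern and stop searching.
            c = module_fqn_to_config[key]
            break
    else:
        # Runs only when the loop finished without hitting `break`.
        c = module_fqn_to_config.get("_default")
    return c


config = {
    r"re:model\.layers\.\d+\.mlp\..*": "int8",
    "_default": "bf16",
}
print(resolve_config("model.layers.3.mlp.fc1", config))  # int8
print(resolve_config("lm_head", config))  # bf16
```

Because the `else` is aligned with `for` rather than with the inner `if`, the default is assigned exactly once, after every key has been tried, instead of on each non-matching key.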

@jerryzh168 jerryzh168 force-pushed the torchao-module-fqn-to-config-regex branch 2 times, most recently from 68ce5ac to f041a99 Compare October 3, 2025 23:39
@jerryzh168 jerryzh168 marked this pull request as ready for review October 4, 2025 00:19
@jerryzh168 jerryzh168 force-pushed the torchao-module-fqn-to-config-regex branch 2 times, most recently from 29210b3 to 608696a Compare October 7, 2025 22:34
@jerryzh168
Contributor Author

@houseroad this is ready for review, by the way; the corresponding torchao PR has landed: pytorch/ao#3084

mergify bot commented Oct 8, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @jerryzh168.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Oct 8, 2025
@jerryzh168 jerryzh168 force-pushed the torchao-module-fqn-to-config-regex branch from 608696a to 21383c7 Compare October 9, 2025 00:19
@mergify mergify bot removed the needs-rebase label Oct 9, 2025
…ng regex

Summary:
As the title says, we are adding regex support to simplify the config, and enabling it in both
transformers and vLLM so that a regex config works everywhere

torchao PR that adds the functionality to quantize_ API: pytorch/ao#3084
transformer PR:

Test Plan:
We save the model with the regex config in transformers; in vLLM we just make sure we
can load the model:

pytest tests/quantization/test_torchao.py test_opt_125m_module_fqn_to_config_regex_model_loading_with_params

Reviewers:

Subscribers:

Tasks:

Tags:

Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>
@jerryzh168 jerryzh168 force-pushed the torchao-module-fqn-to-config-regex branch from 21383c7 to 7c19b0a Compare October 9, 2025 02:22
Collaborator

@houseroad houseroad left a comment

Looks good.

@houseroad houseroad added the ready label (ONLY add when PR is ready to merge/full CI is needed) Oct 9, 2025
@houseroad houseroad enabled auto-merge (squash) October 9, 2025 06:24
@houseroad houseroad merged commit a83ff27 into vllm-project:main Oct 9, 2025
54 checks passed
845473182 pushed a commit to dsxsteven/vllm_splitPR that referenced this pull request Oct 10, 2025
…to loader

* 'loader' of https://github.com/dsxsteven/vllm_splitPR: (778 commits)
  [torchao] Add support for ModuleFqnToConfig using regex (vllm-project#26001)
  ...
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
…#26001)

Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Dhruvilbhatt pushed a commit to Dhruvilbhatt/vllm that referenced this pull request Oct 14, 2025
…#26001)

Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>
Signed-off-by: Dhruvil Bhatt <bhattdbh@amazon.com>
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
alhridoy pushed a commit to alhridoy/vllm that referenced this pull request Oct 24, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
…#26001)

Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
…#26001)

Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>
Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>